Introduce datatable.unique.names policy for duplicate handling in setnames() #4044#7674
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7674 +/- ##
=======================================
Coverage 99.04% 99.04%
=======================================
Files 87 87
Lines 17031 17049 +18
=======================================
+ Hits 16868 16886 +18
Misses 163 163 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
No obvious timing issues in HEAD=issue4044 Generated via commit 471ab20 Download link for the artifact containing the test results: ↓ atime-results.zip
|
| test(2367.6, fread(file(f)), data.table(), warning="Connection has size 0.") | ||
| unlink(f) | ||
|
|
||
| #4044 |
There was a problem hiding this comment.
please dont only write the issue number, but what issue is fixed. this provides information without needing to search for the issue on github.
The tests seem ok but an example of DT = data.table(a=1:2, b=c('x', 'y')) would have been easier to grasp (and also less typing/code)
|
|
||
| process_name_policy = function(names_vec) { | ||
| policy = getOption("datatable.unique.names") | ||
| if (is.null(policy) || policy == "off") return(names_vec) |
There was a problem hiding this comment.
no ultra happy with this double option for "off". I would either eliminate NULL or "off" (leaning towards eliminating NULL though)
| \item{\code{datatable.enlist}}{Experimental feature. Default is \code{NULL}. If set to a function | ||
| (e.g., \code{list}), the \code{j} expression can return a \code{list}, which will then | ||
| be "enlisted" into columns in the result.} | ||
| \item{\code{datatable.unique.names}}{A character string, default \code{NULL} (same as \code{"off"}). |
There was a problem hiding this comment.
we should also document the other options, e.g. warn, error etc.
| msg = paste0("Duplicate column names created: ", brackify(dups), ". This may cause ambiguity.") | ||
|
|
||
| switch(policy, | ||
| warn = warningf(msg), |
There was a problem hiding this comment.
see comment above from previous review. this will trip with
DT = data.table("a%d" = 1, b = 2)
options(datatable.unique.names = "warn")
setnames(DT, "b", "a%d")
# Error in sprintf(gettext(fmt, domain = domain, trim = trim), ...) :
# too few argumentsHence, you need to pass format strings like warningf("%s", msg)
|
|
||
| 5. `tables()` can now optionally report `data.table` objects stored one level deep inside list objects when `depth=1L`, [#2606](https://github.com/Rdatatable/data.table/issues/2606). Thanks @MichaelChirico for the report and @manmita for the PR | ||
|
|
||
| 6. `setnames()` now supports a global option `datatable.unique.names` to control the creation of duplicate column names. Users can choose between `"off"` (default), `"warn"`, `"error"`, or `"rename"`. This addresses long-standing ambiguity issues when duplicate names were created silently, [#4044](https://github.com/Rdatatable/data.table/issues/4044). Thanks to @venom1204 for the PR. |
There was a problem hiding this comment.
I dont think we need this part since its superfluous "This addresses long-standing ambiguity issues when duplicate names were created silently"
| table_name, brackify(duplicate_names), domain=NA) | ||
| } | ||
|
|
||
| process_name_policy = function(names_vec) { |
There was a problem hiding this comment.
Also since we only use it with setnames it should probably live inside data.table.R in front of setnames

Closes #4044
Hi @ben-schwen,
Apologies for opening a new PR—there was an issue with the previous one. I’ve incorporated the requested changes, while keeping the rest unchanged.
Kindly review it when you have the time. Also, sorry for the delay; I was occupied with other commitments.
Thank you!